Skip to content

fix: PgVectorDocumentStore _treat_meta_field and comparison functions now return Composed - string escaping done by psycopg#2964

Merged
davidsbatista merged 7 commits intomainfrom
fix/pgvector-sql-injection-v2
Mar 17, 2026
Merged

fix: PgVectorDocumentStore _treat_meta_field and comparison functions now return Composed - string escaping done by psycopg#2964
davidsbatista merged 7 commits intomainfrom
fix/pgvector-sql-injection-v2

Conversation

@davidsbatista
Copy link
Copy Markdown
Contributor

@davidsbatista davidsbatista commented Mar 16, 2026

Related Issues

Proposed Changes:

  • _treat_meta_field returns a psycopg.sql.Composed object and using psycopg.sql.Literal to embed the field name, so string escaping is now done by psycopg itself
  • filtering helpers (_equal, _not_equal, _greater_than, etc.) have been updated to accept and return Composed objects

How did you test it?

  • filtering tests updated
  • Added a test demonstrating the old string-interpolation approach was vulnerable and that the new approach safely escapes the injection attempt via SQLLiteral

Checklist

@github-actions github-actions Bot added integration:pgvector type:documentation Improvements or additions to documentation labels Mar 16, 2026
@davidsbatista davidsbatista marked this pull request as ready for review March 17, 2026 08:21
@davidsbatista davidsbatista requested a review from a team as a code owner March 17, 2026 08:21
@davidsbatista davidsbatista requested review from sjrl and removed request for a team March 17, 2026 08:21
@davidsbatista davidsbatista changed the title _treat_meta_field and comparison functions now return Composed fix: PgVectorDocumentStore _treat_meta_field and comparison functions now return Composed - string escaping done by psycopg Mar 17, 2026
@anakin87 anakin87 requested review from anakin87 and removed request for sjrl March 17, 2026 08:26
Copy link
Copy Markdown
Member

@anakin87 anakin87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

I left some comments

assert _treat_meta_field(field="meta.name", value=None) == "meta->>'name'"
assert _treat_meta_field(field="meta.other", value={"a": 3, "b": "example"}) == SQL("meta->>") + SQLLiteral("other")
assert _treat_meta_field(field="meta.empty_list", value=[]) == SQL("meta->>") + SQLLiteral("empty_list")
assert _treat_meta_field(field="meta.name", value=None) == SQL("meta->>") + SQLLiteral("name")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's discuss tests: I understand that the return type has changed, so we need to do something like this, which, unfortunately, is hard to read/maintain.

An alternative would be to build a psycopg cursor and use it to convert SQL to str, keeping the test similar to the previous one. An example for test_treat_meta_field:

def test_treat_meta_field(document_store):
    document_store._ensure_db_setup()
    cursor = document_store._cursor

    expr = _treat_meta_field(field="meta.number", value=9)
    assert expr.as_string(cursor) == "(meta->>'number')::integer"
    ...

The only problem with this approach: the test is no longer a pure unit test since it needs a DB connection. But we keep readability. WDYT?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about something like this:

def test_treat_meta_field():
      cast_integer = SQL("(") + SQL("meta->>") + SQLLiteral("number") + SQL(")::integer")
      no_cast_name  = SQL("meta->>") + SQLLiteral("name")
      cast_real     = SQL("(") + SQL("meta->>") + SQLLiteral("number") + SQL(")::real")
      cast_boolean  = SQL("(") + SQL("meta->>") + SQLLiteral("bool") + SQL(")::boolean")

      assert _treat_meta_field(field="meta.number", value=9) == cast_integer
      assert _treat_meta_field(field="meta.number", value=[1, 2, 3]) == cast_integer
      assert _treat_meta_field(field="meta.name", value="my_name") == no_cast_name
      assert _treat_meta_field(field="meta.name", value=["my_name"]) == no_cast_name
      assert _treat_meta_field(field="meta.number", value=1.1) == cast_real
      assert _treat_meta_field(field="meta.number", value=[1.1, 2.2]) == cast_real
      assert _treat_meta_field(field="meta.bool", value=True) == cast_boolean
      assert _treat_meta_field(field="meta.bool", value=[True, False]) == cast_boolean

it compares two Composed instances

Copy link
Copy Markdown
Member

@anakin87 anakin87 Mar 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test remains quite understandable in any case.
test_comparison_condition_missing_operator test_logical_condition_nested is hard to read instead.

For this reason, I was advocating to resolve to string in the test.

What's your opinion?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused.

To make sure I understand your point.

You agree with the approach for the test test_treat_meta_field as is, but instead of repr() you suggest having it as a string to compare the full output - is that it?

Regarding the test_comparison_condition_missing_operator- it's not changed by this PR - but I miss what you are suggesting.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I was pointing to the wrong test. It's test_logical_condition_nested that gets a bit hard to read.

If we want to resolve to a string for easier comparison, we can use a cursor as I proposed in #2964 (comment)

In any case, I'll leave this decision up to you.

Copy link
Copy Markdown
Member

@anakin87 anakin87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

Feel free to take #2964 (comment) into consideration or disregard it if you think that tests are sufficiently readable

@davidsbatista
Copy link
Copy Markdown
Contributor Author

Ah! I see what you mean now. I can fix the readability of that test as well.

@davidsbatista davidsbatista merged commit ed55ef1 into main Mar 17, 2026
8 checks passed
@davidsbatista davidsbatista deleted the fix/pgvector-sql-injection-v2 branch March 17, 2026 12:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

integration:pgvector type:documentation Improvements or additions to documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants